ANALYSIS OF MY RUNNING WORKOUTS¶

I was having a look at my running data in Strava the other day and I happily noticed a good increase in my running volumes.

I wanted to analyse the data a bit further, but had to face the limitations of my free account...which doesn't allow downloads.

The data analyst in me exploded with joy at this opportunity! I have all my workout data available in my Apple Health account, where I also keep my diabetes, nutrition and other bio measurements.

Why not build some custom Python code to replicate the Strava reports...and enrich them with all that bounty of biomarkers? A wonderful opportunity for a curious diabetic to put it all together: the running, the eating, the blood glucose!


In this analysis, I provide an overview of my journey back to running after a stress fracture in my ankle, how I am training to become and stay an injury-free runner for life, and how running is improving my life as a type-1 diabetic.

I do so by downloading, assessing and processing the Apple Health data exported from my Apple account via the app 'HealthExport'.

I then use the Plotly library to display several visualisations, including bar charts, line charts and treemaps.

Enough talking, let's dive in!

GATHERING THE DATA¶

In [1]:
# importing
import os
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

color_pal = sns.color_palette()
plt.style.use('fivethirtyeight')
In [2]:
# Define the path of the input and output folders
input_folder = './running_data'
output_folder = './running_data_transformed'

print('PREPARING APPLE HEALTH CSV EXPORTS')
print(' ')

# Get a list of all CSV files in the input folder
csv_files = [f for f in os.listdir(input_folder) if f.endswith('.csv')]

# Loop through each CSV file
for file in csv_files:
    # Construct the full path of the input file
    input_file = os.path.join(input_folder, file)
    
    # Construct the full path of the output file (with a '_transformed' suffix, so the existence check below matches the file actually saved)
    output_file = os.path.join(output_folder, os.path.splitext(file)[0] + '_transformed.csv')
    
    # Check if the output file already exists, and skip if it does
    if os.path.exists(output_file):
        print(f'{output_file} already exists, skipping...')
        continue
    
    # Read in the CSV file as a dataframe
    df = pd.read_csv(input_file)
    # Normalise column names: lowercase with underscores
    df.rename(columns=lambda x: x.strip().lower().replace(" ", "_"), inplace=True)

    # Apply transformations
    # Remove hh-mm-ss, keeping only the date part
    df['date'] = df['date'].apply(lambda x: x.split()[0])
    # Convert to datetime
    df['date'] = pd.to_datetime(df['date'])
    # Set the 'date' column as the index
    df = df.set_index('date')

    # Save the transformed dataframe as a new csv file
    new_file_name = os.path.splitext(file)[0] + '_transformed.csv'  # Create a new file name
    
    # Save the dataframe as a CSV file in the output folder
    df.to_csv(os.path.join(output_folder, new_file_name), index=True) 
    
    print(f'{input_file} processed and saved to {output_file}.')

print(' ')
print('LOADING PROCESSED APPLE HEALTH CSV EXPORTS AS DATAFRAMES')
print(' ')
# Get a list of all CSV files in the folder
transformed_csv_files = [f for f in os.listdir(output_folder) if f.endswith('.csv')]

# Loop through each transformed CSV file
for file in transformed_csv_files:
    # Construct the full path of the CSV file
    file_path = os.path.join(output_folder, file)
    
    # Read the CSV file into a dataframe
    df_name = os.path.splitext(file)[0].split('_')[0]  # use the filename as the dataframe name
    globals()[df_name] = pd.read_csv(file_path)
    
    # setting date column as index in all dataframes
    globals()[df_name]['date'] = pd.to_datetime(globals()[df_name]['date'])
    globals()[df_name].set_index(['date'], inplace=True)
    
    print(f'{file_path} loaded as {df_name}.')
PREPARING APPLE HEALTH CSV EXPORTS
 
./running_data/workouts.csv processed and saved to ./running_data_transformed/workouts_transformed.csv.
./running_data/insulin.csv processed and saved to ./running_data_transformed/insulin_transformed.csv.
./running_data/energy.csv processed and saved to ./running_data_transformed/energy_transformed.csv.
./running_data/glucose2.csv processed and saved to ./running_data_transformed/glucose2_transformed.csv.
./running_data/glucose.csv processed and saved to ./running_data_transformed/glucose_transformed.csv.
 
LOADING PROCESSED APPLE HEALTH CSV EXPORTS AS DATAFRAMES
 
./running_data_transformed/insulin_transformed.csv loaded as insulin.
./running_data_transformed/workouts_transformed.csv loaded as workouts.
./running_data_transformed/glucose_transformed.csv loaded as glucose.
./running_data_transformed/energy_transformed.csv loaded as energy.
./running_data_transformed/glucose2_transformed.csv loaded as glucose2.
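A side note on the loading pattern: assigning dataframes via `globals()` works in a notebook, but collecting them in a dictionary is usually a safer alternative and easier to iterate over. A minimal sketch of that approach, assuming the same folder layout (the `load_folder` helper is my own name, not part of any library):

```python
import os
import pandas as pd

def load_folder(folder):
    """Load every CSV in `folder` into a dict of dataframes keyed by the base file name."""
    frames = {}
    for file in os.listdir(folder):
        if not file.endswith('.csv'):
            continue
        # 'glucose_transformed.csv' -> 'glucose', same naming rule as above
        name = os.path.splitext(file)[0].split('_')[0]
        frames[name] = pd.read_csv(
            os.path.join(folder, file),
            parse_dates=['date'],
            index_col='date',
        )
    return frames

# usage: dfs = load_folder('./running_data_transformed'); dfs['workouts'].tail()
```

This keeps the namespace clean and makes it trivial to loop over all dataframes, e.g. to print each one's shape.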

ASSESSING THE DATA¶

In [3]:
insulin.tail()
Out[3]:
insulin_delivery(iu) purpose
date
2023-03-04 35.0 Bolus
2023-03-04 10.0 Basal
2023-03-05 23.0 Bolus
2023-03-05 10.0 Basal
2023-03-06 3.0 Bolus
In [4]:
workouts.tail()
Out[4]:
active_energy_burned(kcal) activity distance(km) duration(s) elevation:_ascended(m) elevation:_maximum(m) elevation:_minimum(m) heart_rate_zone:_a_easy_(<115bpm)(%) heart_rate_zone:_b_fat_burn_(115-135bpm)(%) heart_rate_zone:_c_moderate_training_(135-155bpm)(%) heart_rate_zone:_d_hard_training_(155-175bpm)(%) heart_rate_zone:_e_extreme_training_(>175bpm)(%) heart_rate:_average(count/min) heart_rate:_maximum(count/min) mets_average(kcal/hr·kg) weather:_humidity(%) weather:_temperature(degc)
date
2023-03-21 59.407 Yoga NaN 748.569 NaN NaN NaN 0.937 0.063 0.000 0.0 0.0 97.928 125.0 4.444 77.0 12.77
2023-03-22 51.217 Walking 0.780 586.305 25.27 72.211 45.940 1.000 0.000 0.000 0.0 0.0 84.696 97.0 5.921 82.0 8.66
2023-03-22 295.996 Traditional Strength Training NaN 4177.796 NaN NaN NaN 0.873 0.073 0.054 0.0 0.0 94.049 151.0 2.260 82.0 8.58
2023-03-22 114.325 Walking 2.316 1666.656 25.04 71.709 45.807 0.998 0.002 0.000 0.0 0.0 74.951 116.0 3.622 81.0 9.11
2023-03-22 64.154 Walking 1.047 907.100 17.43 60.478 49.434 0.983 0.017 0.000 0.0 0.0 84.243 118.0 4.949 78.0 12.18
In [5]:
energy.tail()
Out[5]:
active_energy_burned(kcal) basal_energy_burned(kcal) carbohydrates(g) fat_saturated(g) fat_total(g) protein(g) step_count(count)
date
2023-03-18 1039.153 1544.700 578.0 9.267 38.0 106.0 7204.112
2023-03-19 1109.846 1541.463 551.0 8.101 35.0 101.0 14835.518
2023-03-20 973.931 1541.861 504.0 5.421 29.0 121.0 12956.085
2023-03-21 1140.518 1541.979 502.0 7.199 39.0 104.0 14477.000
2023-03-22 609.878 1013.088 433.0 8.930 32.0 108.0 7680.007
In [6]:
glucose.tail()
Out[6]:
blood_glucose(mg/dl) insulin_delivery(iu)
date
2023-03-22 124.0 NaN
2023-03-22 123.0 NaN
2023-03-22 125.0 NaN
2023-03-22 125.0 NaN
2023-03-22 125.0 NaN

CLEANING THE DATA¶

The bulk of the dataframe cleaning was done while loading. Now I will just fine-tune some details specific to the datapoints I will soon be using.

1) Aggregate daily data from CGM ✅¶

My CGM collects blood glucose data every 5 minutes, meaning there will be around 288 measurements per day (24 × 60 / 5 = 288).

  • I will take the daily average, by aggregating 'blood_glucose(mg/dl)' using the 'mean'.
  • For insulin, I will sum up the total units per day, using the 'sum'.
In [7]:
# Aggregate CGM readings per day: mean glucose, total insulin
glucose = glucose.groupby('date').agg({'blood_glucose(mg/dl)': 'mean', 
                                       'insulin_delivery(iu)': 'sum', 
                                               })
glucose.tail()
# insulin data is 0 in March due to device issues
Out[7]:
blood_glucose(mg/dl) insulin_delivery(iu)
date
2023-03-18 138.562500 0.0
2023-03-19 140.038194 0.0
2023-03-20 157.086806 0.0
2023-03-21 135.003472 0.0
2023-03-22 152.090323 0.0

2) Convert 'duration' to minutes ✅¶

The workout duration is expressed in seconds. I will convert it to minutes.

In [8]:
# from seconds to minutes
workouts['duration(s)'] = workouts['duration(s)'] / 60
workouts.rename({'duration(s)': 'duration(m)'}, axis=1, inplace=True)

# Look for the 'duration(m)' column
workouts.tail()
Out[8]:
active_energy_burned(kcal) activity distance(km) duration(m) elevation:_ascended(m) elevation:_maximum(m) elevation:_minimum(m) heart_rate_zone:_a_easy_(<115bpm)(%) heart_rate_zone:_b_fat_burn_(115-135bpm)(%) heart_rate_zone:_c_moderate_training_(135-155bpm)(%) heart_rate_zone:_d_hard_training_(155-175bpm)(%) heart_rate_zone:_e_extreme_training_(>175bpm)(%) heart_rate:_average(count/min) heart_rate:_maximum(count/min) mets_average(kcal/hr·kg) weather:_humidity(%) weather:_temperature(degc)
date
2023-03-21 59.407 Yoga NaN 12.476150 NaN NaN NaN 0.937 0.063 0.000 0.0 0.0 97.928 125.0 4.444 77.0 12.77
2023-03-22 51.217 Walking 0.780 9.771750 25.27 72.211 45.940 1.000 0.000 0.000 0.0 0.0 84.696 97.0 5.921 82.0 8.66
2023-03-22 295.996 Traditional Strength Training NaN 69.629933 NaN NaN NaN 0.873 0.073 0.054 0.0 0.0 94.049 151.0 2.260 82.0 8.58
2023-03-22 114.325 Walking 2.316 27.777600 25.04 71.709 45.807 0.998 0.002 0.000 0.0 0.0 74.951 116.0 3.622 81.0 9.11
2023-03-22 64.154 Walking 1.047 15.118333 17.43 60.478 49.434 0.983 0.017 0.000 0.0 0.0 84.243 118.0 4.949 78.0 12.18

3) Create 'running' dataframe¶

I will subset the 'workouts' dataframe to only include Running activities, and make a 'running' dataframe out of it. Then:

  • Rename target columns and drop the unnecessary ones;
  • Aggregate data (sums and means);
  • Save the dataframe as .csv file.
In [9]:
running = workouts.query("activity=='Running'")
running.tail()
Out[9]:
active_energy_burned(kcal) activity distance(km) duration(m) elevation:_ascended(m) elevation:_maximum(m) elevation:_minimum(m) heart_rate_zone:_a_easy_(<115bpm)(%) heart_rate_zone:_b_fat_burn_(115-135bpm)(%) heart_rate_zone:_c_moderate_training_(135-155bpm)(%) heart_rate_zone:_d_hard_training_(155-175bpm)(%) heart_rate_zone:_e_extreme_training_(>175bpm)(%) heart_rate:_average(count/min) heart_rate:_maximum(count/min) mets_average(kcal/hr·kg) weather:_humidity(%) weather:_temperature(degc)
date
2023-03-10 515.472 Running 9.104 61.980817 45.27 84.313 45.149 0.017 0.111 0.659 0.213 0.0 146.439 163.0 8.710 87.0 8.07
2023-03-14 177.655 Running 3.085 23.411883 38.89 132.882 46.554 0.322 0.341 0.337 0.000 0.0 122.366 144.0 8.031 89.0 9.53
2023-03-14 219.222 Running 4.182 33.298567 4.99 88.997 44.477 0.305 0.413 0.282 0.000 0.0 123.195 146.0 7.114 91.0 9.09
2023-03-17 587.974 Running 8.818 60.425883 75.97 92.401 45.330 0.010 0.284 0.706 0.000 0.0 139.452 153.0 10.044 68.0 8.41
2023-03-21 534.337 Running 9.070 60.309700 51.23 96.909 45.033 0.000 0.054 0.401 0.545 0.0 156.176 173.0 9.190 84.0 8.66

4) Renaming target columns¶

In [10]:
running.columns
Out[10]:
Index(['active_energy_burned(kcal)', 'activity', 'distance(km)', 'duration(m)',
       'elevation:_ascended(m)', 'elevation:_maximum(m)',
       'elevation:_minimum(m)', 'heart_rate_zone:_a_easy_(<115bpm)(%)',
       'heart_rate_zone:_b_fat_burn_(115-135bpm)(%)',
       'heart_rate_zone:_c_moderate_training_(135-155bpm)(%)',
       'heart_rate_zone:_d_hard_training_(155-175bpm)(%)',
       'heart_rate_zone:_e_extreme_training_(>175bpm)(%)',
       'heart_rate:_average(count/min)', 'heart_rate:_maximum(count/min)',
       'mets_average(kcal/hr·kg)', 'weather:_humidity(%)',
       'weather:_temperature(degc)'],
      dtype='object')
In [11]:
# renaming target columns
running = running.rename({'active_energy_burned(kcal)':'kcal_burned',
                'heart_rate_zone:_a_easy_(<115bpm)(%)':'zone1_(<115bpm)(%)',
                'heart_rate_zone:_b_fat_burn_(115-135bpm)(%)':'zone2_(115-135bpm)(%)',
                'heart_rate_zone:_c_moderate_training_(135-155bpm)(%)':'zone3_(135-155bpm)(%)',
                'heart_rate_zone:_d_hard_training_(155-175bpm)(%)':'zone4_(155-175bpm)(%)',
                'heart_rate_zone:_e_extreme_training_(>175bpm)(%)':'zone5_(>175bpm)(%)',
                'heart_rate:_average(count/min)':'avg_HR', 
                'heart_rate:_maximum(count/min)': 'max_HR'}, 
               axis=1, 
               #inplace=True # commented out to avoid setting a value on a copy of a slice of the dataframe
              )
# removing extra columns
running = running.drop(['elevation:_ascended(m)', 
              'elevation:_maximum(m)',
              'elevation:_minimum(m)',
              'mets_average(kcal/hr·kg)', 
              'weather:_humidity(%)',
              'weather:_temperature(degc)'], 
                       axis=1,
                       #inplace=True # commented out to avoid setting a value on a copy of a slice of the dataframe
                      )

5) Aggregate data for workouts in same day¶

Some days, like '2023-03-14', my Apple Watch started two activities during the same session. I will sum them up.

In [12]:
# Aggregate data for workouts in same day
running_grouped = running.groupby('date').agg({'kcal_burned': 'sum', # sum of calories, distance and duration
                                               'distance(km)': 'sum', 
                                               'duration(m)': 'sum',
                                               'zone1_(<115bpm)(%)': 'mean', # average of the other values
                                               'zone2_(115-135bpm)(%)': 'mean',
                                               'zone3_(135-155bpm)(%)': 'mean', 
                                               'zone4_(155-175bpm)(%)': 'mean',
                                               'zone5_(>175bpm)(%)': 'mean',
                                               'avg_HR':'mean',
                                               'max_HR':'mean'})
running_grouped.tail()
Out[12]:
kcal_burned distance(km) duration(m) zone1_(<115bpm)(%) zone2_(115-135bpm)(%) zone3_(135-155bpm)(%) zone4_(155-175bpm)(%) zone5_(>175bpm)(%) avg_HR max_HR
date
2023-03-03 604.088 8.329 60.453483 0.0160 0.270 0.7140 0.000 0.0 137.3540 152.0
2023-03-10 515.472 9.104 61.980817 0.0170 0.111 0.6590 0.213 0.0 146.4390 163.0
2023-03-14 396.877 7.267 56.710450 0.3135 0.377 0.3095 0.000 0.0 122.7805 145.0
2023-03-17 587.974 8.818 60.425883 0.0100 0.284 0.7060 0.000 0.0 139.4520 153.0
2023-03-21 534.337 9.070 60.309700 0.0000 0.054 0.4010 0.545 0.0 156.1760 173.0
In [13]:
# saving the processed dataframe to the folder
running_grouped.to_csv(os.path.join(output_folder, 'running_transformed.csv'), index=True) 

VISUALISING THE DATA¶

We're ready for some dashboarding! I will divide the analysis into the following chapters:

BACK AT IT - Recovery Progression: Workout distance, calories and duration¶

I will display three simple bar-charts to show the evolution of

  • distance (in KM)
  • calories
  • duration (in minutes)

of my workouts. Each chart also includes a line showing a rolling average of the weekly totals (a 14-point window), which helps put the progression into a longer-term context.
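The resample-then-roll mechanics used in the chart cells further down can be sketched on toy data (the dates and values below are made up purely for illustration):

```python
import pandas as pd

# Two weeks of daily distances: 1 km/day, then 2 km/day (toy values)
daily = pd.Series(
    [1.0] * 7 + [2.0] * 7,
    index=pd.date_range('2023-01-02', periods=14, freq='D'),  # starts on a Monday
)

# One total per calendar week (weeks end on Sunday by default)
weekly = daily.resample('W').sum()                  # -> 7.0, 14.0

# Rolling mean over the weekly totals; min_periods=1 avoids NaN at the start
smooth = weekly.rolling(2, min_periods=1).mean()    # -> 7.0, 10.5
```

The same pattern, with a wider window, produces the trend line overlaid on the bar charts.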

HEART RATE ZONES ANALYSIS¶

Since I am in a recovery/back-from-injury period, my runs need to be at a conversational pace (heart rate zones 1 and 2).

WALKING MY WAY BACK¶

Running, in this phase, is still low in volume relative to walking. I am using walking as my true endurance builder at the moment: the perfect activity to spend a lot of time on my legs and let the joints adapt and stay in motion, without taxing the healing ankle.

I will use treemaps to show the portion of time spent in the various heart rate zones, as well as the relative portion of running to walking.

RUNNING AND BLOOD GLUCOSE¶

I will combine my run data with my DexcomOne (Continuous Glucose Monitor) data into a line chart to see how the two are evolving.


I will create graphs in an interactive dashboard using the Plotly library.

In [14]:
run = pd.read_csv(
    os.path.join(output_folder, 'running_transformed.csv'),
    parse_dates=['date'],
    index_col=['date']
)
run.head()
Out[14]:
kcal_burned distance(km) duration(m) zone1_(<115bpm)(%) zone2_(115-135bpm)(%) zone3_(135-155bpm)(%) zone4_(155-175bpm)(%) zone5_(>175bpm)(%) avg_HR max_HR
date
2022-11-04 36.571 0.668 4.154350 0.041 0.184 0.775 0.0 0.0 139.163 150.0
2022-11-19 35.645 0.669 4.632467 0.500 0.500 0.000 0.0 0.0 120.250 133.0
2022-11-22 40.238 0.669 4.396233 0.000 0.510 0.490 0.0 0.0 136.157 146.0
2022-12-05 38.641 0.657 4.295183 0.020 0.451 0.529 0.0 0.0 135.353 143.0
2022-12-06 37.179 0.641 3.996717 0.044 0.222 0.734 0.0 0.0 136.867 147.0

BACK AT IT - Recovery Progression: Workout distance, calories and duration¶

Starting slow again¶

I started to run again after my injury in the second half of December 2022.

As part of my comeback from recovery, during the month of December I ran twice a week.

"Running" is probably not the right word: each session lasted around 30-40 minutes, covering a distance of approximately 5 km, and always alternated walking and running intervals (for example: 3 minutes running, 2 minutes walking, repeated 8-10 times).

This gentle approach allowed me to put some kilometers in my legs without stressing my healing ankle too much.
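As a quick sanity check on those numbers, the interval arithmetic of such a session is easy to verify (the values are the ones quoted above):

```python
# 3' running + 2' walking, repeated 8 times (toy check on the session structure)
run_min, walk_min, repeats = 3, 2, 8

session_min = (run_min + walk_min) * repeats   # total session length
run_only_min = run_min * repeats               # actual running time within it

print(session_min, run_only_min)  # 40 total minutes, of which 24 running
```

With 10 repeats instead of 8, the session stretches to 50 minutes, which matches the 30-40+ minute range of those early workouts.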

Gaining Momentum: "Running School" Program¶

From 21 January 2023, I started the Running School's 12-week program (the blue area in the charts), ramping up my workout schedule to two 1-hour sessions a week.

The increase in distance was rapid but still part of a gentle progression.

Each running workout provided by the coaches was meant to develop a different aspect of technique, such as cadence, rhythm, balance and posture.

Walk-run intervals were still the core of the training, especially during the first three weeks, until the end of February.

In [15]:
#import libraries
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

# Define list of metrics to display in graphs
metrics = [
    'distance(km)', 
    'duration(m)',
    'kcal_burned'
]

# creating a dataframe for weekly data
df2 = run[metrics].copy()
df2.index = pd.to_datetime(df2.index)
weekly_df = (
    df2
    .resample('W')         # Referring to the Week 
    .sum()                 # Summing weekly data
    .round().astype(int))  # Making it more reader-friendly


# function that will generate a Plotly chart based on the selected metric
def generate_line_chart(metric):
    """
    Takes a metric name from the weekly running data.
    Generates a Plotly figure: bars of weekly totals plus a rolling-average line.
    """
    fig = px.line(weekly_df, 
                  x=weekly_df.index, 
                  y=weekly_df[metric].rolling(14, min_periods=7).mean(),  # rolling mean over up to 14 weekly totals
                  color_discrete_sequence=['#4169E1'],
                 )
    fig.add_trace(go.Bar(x=weekly_df.index, 
                         y=weekly_df[metric],
                        name=metric),
                 )    
    
    # Annotating the peak
    peak_index = weekly_df[metric].idxmax()
    fig.add_annotation(x=peak_index, 
                       y=weekly_df.loc[peak_index, metric], 
                       text="Peak", 
                       showarrow=True, 
                       arrowhead=1)


    # Applying new theme and adding a title
    fig.update_layout(template='ggplot2',
                      title=f"Total weekly {metric.capitalize()} - Running", 
                      xaxis_title="Date", 
                      yaxis_title=metric,
                      showlegend=True,
                      shapes=[                     # Adding shape to highlight the Running School period
                          dict(
                                type='rect',
                                xref='x',
                                yref='paper',
                                x0='2023-01-21',
                                y0=0,
                                x1='2023-03-30',
                                y1=1,
                                fillcolor='blue',
                                opacity=0.2,
                                layer='below',
                                line_width=0
                            )
                      ])



    fig.show()
    
for metric in metrics:
    generate_line_chart(metric)

The increasing intensity of my workouts during the Running School (Blue area) goes hand in hand with the calories burned in each session and with the duration of workouts.

Two 1-hour workouts a week since the end of January add up to around 2 hours of running each week. Before then, my 'runs' lasted 30-45 minutes, at a very easy pace.

Essentially, since I started the Running School, on a weekly basis I went:

  • from 60-80 minutes to 120 total minutes of running,
  • from 6-8km to 12-16km,
  • from around 600kcal burned from running to around 1000kcal.

HEART RATE ZONES ANALYSIS¶

I have also been very cautious with the rhythm of my runs, ensuring that I don't push too much and impair my body's recovery.

Monitoring my heart rate zones is crucial for this, and I try to spend roughly 80% of my workouts between Zone 2 (115-135bpm), the infamous 'conversational pace', and Zone 3 (135-155bpm)!

Let's have a look:

In [16]:
run.columns
Out[16]:
Index(['kcal_burned', 'distance(km)', 'duration(m)', 'zone1_(<115bpm)(%)',
       'zone2_(115-135bpm)(%)', 'zone3_(135-155bpm)(%)',
       'zone4_(155-175bpm)(%)', 'zone5_(>175bpm)(%)', 'avg_HR', 'max_HR'],
      dtype='object')
In [17]:
# Subsetting dataframe
hrzones = run[['zone1_(<115bpm)(%)',
       'zone2_(115-135bpm)(%)', 'zone3_(135-155bpm)(%)',
       'zone4_(155-175bpm)(%)', 'zone5_(>175bpm)(%)']].loc[run.index>'2023-01-01'].mean()

avg_hr = pd.DataFrame(hrzones.reset_index())
avg_hr.columns = ['HR_Zone','avg_time_in_zone']
avg_hr['avg_time_in_zone(%)'] = (avg_hr['avg_time_in_zone'] * 100).astype(int)
#avg_hr

Since I spent no time in zone5, I will drop that row to avoid issues when displaying the treemap.

In [19]:
avg_hr = avg_hr.loc[~(avg_hr['avg_time_in_zone(%)']==0)]
avg_hr
Out[19]:
HR_Zone avg_time_in_zone avg_time_in_zone(%)
0 zone1_(<115bpm)(%) 0.227636 22
1 zone2_(115-135bpm)(%) 0.415977 41
2 zone3_(135-155bpm)(%) 0.316477 31
3 zone4_(155-175bpm)(%) 0.039909 3
In [20]:
#import libraries
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots

fig = px.treemap(avg_hr, 
                 path=['HR_Zone'], 
                 values='avg_time_in_zone(%)',
                 color='avg_time_in_zone(%)', 
                 color_continuous_scale='Blues'
                )

fig.update_layout(template='ggplot2',
                  title="Time in HR zones (%)",
                  #treemapcolorway = ["blue"],
                 )
fig.update_traces(#root_color="green",
                  marker=dict(cornerradius=5),
                  labels = ["Zone 1", "Zone 2", "Zone 3", "Zone 4"],  # four rows remain after dropping Zone 5
                  values = list(avg_hr['avg_time_in_zone(%)']),
                  textinfo = "label+value"
                 )
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()

And indeed, I am spending most of my running time in Zone 2 (41%) and Zone 1 (22%), which combined make a good 63% of running done at a conversational pace...perfect for a recovery!

Extra: Average HR before and during the Running School.¶

Based on the new column 'is_RS': 'N' means it was not a Running School day, 'Y' means it was!

(I could have done this earlier, before creating the bar charts. I kept the function as it was: one problem, two approaches to solve it!)

In [21]:
import numpy as np
run['is_RS'] = np.where(run.index<'2023-01-21', 'N', 'Y')

# Subsetting dataframe: numeric zone columns only, so .mean() raises no nuisance-column warning
hrzones_RS = run.loc[run['is_RS']=='Y', ['zone1_(<115bpm)(%)',
       'zone2_(115-135bpm)(%)', 'zone3_(135-155bpm)(%)',
       'zone4_(155-175bpm)(%)', 'zone5_(>175bpm)(%)']].mean()

In [22]:
import plotly.graph_objs as go
fig = go.Figure()

fig.add_trace(go.Scatter(
   x=run.loc[run['is_RS']=='N'].index,
   y=run.loc[run['is_RS']=='N']['avg_HR'],
    mode='lines',
   name="Before Running School")
)

fig.add_trace(go.Scatter(
   x=run.loc[run['is_RS']=='Y'].index,
   y=run.loc[run['is_RS']=='Y']['avg_HR'],
    mode='lines',
   name="During Running School")
)

# Annotating the peak
peak_index = run['avg_HR'].idxmax()
fig.add_annotation(x=peak_index, 
                   y=run.loc[peak_index, 'avg_HR'], 
                   text="Peak", 
                   showarrow=True, 
                   arrowhead=1)

# Before and After RS
fig.add_shape(type='line', x0='2023-01-21', y0=0, x1='2023-01-21', y1=max(run['max_HR'])-5,
              line=dict(color='black', width=3, dash='dash'))
# Add an annotation on the vertical line
fig.add_annotation(x='2023-01-21', y=max((run['max_HR'])), 
                   text='Start of Running School', 
                   showarrow=True, arrowhead=1
                  )


# Applying new theme and adding a title
fig.update_layout(template='ggplot2',
                  title="Average Heart Rate", 
                  xaxis_title="Date", 
                  yaxis_title='avg_HR'
                 )

fig.show()

WALKING MY WAY BACK¶

Walking is the most important!¶

Spending time on the legs, broadly speaking, is the best way to recover them and keep them trained in a way that is safe and not conducive to injury.

The 'couch to 5K' protocol sounds great, but when the body is not used to being and staying in motion, it can be dangerous. In fact, that is where injuries come from: one never runs nor walks, sits the whole day, barely works the joints. Then one goes for a run, suddenly piling miles onto a body not yet fit for the effort, and voilà, injury!

I learned this the hard way, and that's why I just walk all the time. Grocery shopping, office commutes, chores of any kind...you name it! Unless the distance is prohibitive or I have time constraints, I'll walk my way there.

How much walking are we talking about? Let's have a look!

Analysis process¶

  1. Build a dataframe of only cardio activities (from the main 'workouts' dataframe, I will include 'Walking', 'Running', 'Running (Indoor)', 'Cycling', 'Cycling (Indoor)')
  2. Sum up the total minutes for each activity (data is for the last 3 months)
  3. Display data on a treemap
In [23]:
# Summary of cardio activities over the last three months
cardio_activities = ['Walking', 'Running', 'Running (Indoor)', 'Cycling', 'Cycling (Indoor)']
cardio_df = workouts.loc[workouts["activity"].isin(cardio_activities)][['activity','active_energy_burned(kcal)','duration(m)' ]]
cardio_df = cardio_df.groupby('activity').sum().reset_index().sort_values(by=['duration(m)'], ascending=True)
In [24]:
cardio_df["%_of_total_time"]=(cardio_df['duration(m)'] / cardio_df['duration(m)'].sum())*100
In [25]:
cardio_df = cardio_df.round()

# Combining cycling and running data into one row for each

## Rename activity
cardio_df['activity'] = cardio_df['activity'].replace({'Cycling (Indoor)': 'Cycling',
                                                      'Running (Indoor)': 'Running'})

## Groupby and sum activity
cardio_df = cardio_df.groupby(['activity']).sum().reset_index()
cardio_df
Out[25]:
activity active_energy_burned(kcal) duration(m) %_of_total_time
0 Cycling 21878.0 2510.0 20.0
1 Running 11643.0 1473.0 12.0
2 Walking 34127.0 8554.0 68.0
In [26]:
fig = px.treemap(cardio_df, 
                 path=['activity'], 
                 values='%_of_total_time',
                 color='%_of_total_time', 
                 color_continuous_scale='Blues'
                )

fig.update_layout(template='ggplot2',
                  title="Total Time on Legs (by activity type)",
                  #treemapcolorway = ["blue"],
                 )
fig.update_traces(root_color="green",
                  marker=dict(cornerradius=5),
                  labels = ['Cycling', 'Running', 'Walking'],
                  values = list(cardio_df['%_of_total_time']),
                  textinfo = "label+value"
                 )

fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()

As you can see, although my running workouts keep increasing in frequency and distance, I spend close to 70% of my time on legs just walking! That's where endurance is built: by keeping the legs, the joints and the whole body constantly active.

I have learned my lesson¶

My past injuries (knee pain and a stress fracture on the ankle) had one clear cause: one day in 2019 I decided I would run 10km every day. I had almost never run with that frequency, nor was I used to the distance. Although I could handle the individual workouts, my body only lasted one month before it started to tear apart.

You don't become a runner just because one day, out of the blue, you decide to run. You become a runner by designing your life in a way that is conducive to maintaining a healthy body, which in turn allows running.

Randomly standing up from the couch and lacing up the shoes may be good to start and find some initial motivation, but running in the long term requires that our body is able to handle the effort.

That is why I walk so much (and do strength training and joint training!). Stay ready so you don't have to get ready!

RUNNING AND BLOOD GLUCOSE¶

Is my daily average blood glucose getting better as I increase the length and intensity of my running workouts?

Let's have a closer look, bringing to the graph the data coming from my DexcomOne (Continuous Glucose Monitor).

In [27]:
fig = go.Figure()

# add the blood_glucose line trace to the plot
fig.add_trace(
    go.Scatter(x=glucose.index, 
               y=glucose['blood_glucose(mg/dl)'].rolling(7, 1).mean(), # 7-day moving average
               mode='lines', 
               name='Average blood glucose', 
               yaxis="y1",  # first y-axis

               )
)

# add the distance line trace to the plot
fig.add_trace(
    go.Scatter(x=run.index, 
               y=run['distance(km)'].rolling(7, 1).mean(), # 7-day moving average
               mode='lines', 
               name='Total weekly distance',
               yaxis="y2",  # second y-axis
               )
)

# Before and After RS
fig.add_shape(type='line', x0='2023-01-21', y0=0, x1='2023-01-21', y1=max(glucose['blood_glucose(mg/dl)'])-10,
              line=dict(color='black', width=3, dash='dash'))
# Annotating the RS
fig.add_annotation(x='2023-01-21', y=max(glucose['blood_glucose(mg/dl)']), 
                   text='Start of Running School', 
                   showarrow=True, arrowhead=1
                  )


# show both y-axes in their own y-scale
fig.update_layout(
    template='ggplot2',
    title='Blood glucose vs Running Distance',
    xaxis_title='Week', 
    yaxis=dict(
        title='blood_glucose(mg/dl)',
        #titlefont=dict(color='blue'),
        #tickfont=dict(color='blue')
    ),
    yaxis2=dict(
        title='distance(km)',
        #titlefont=dict(color='red'),
        #tickfont=dict(color='red'),
        overlaying='y',
        side='right'
    )
)

# show the plot
fig.show()

Interesting! As my running increased (shown by the rise in weekly kilometers), my average daily blood glucose decreased (red line). That's the best finding to emerge from this analysis!

I am writing more about my plant-based nutrition as a type-1 diabetic endurance athlete. If you're curious, you can have a look here.

See you in the next dashboard! 🌱🍌💪🏻🏃🏻‍♂️¶